Using Reinforcement Learning for Proactive Network Fault Management
Abstract
For high-speed networks, it is important that fault management be proactive, i.e., that it detect, diagnose, and mitigate problems before they cause severe degradation of network performance. Proactive fault management depends on monitoring the network to obtain the data on which to base the manager's decisions. However, monitoring introduces additional overhead that may itself degrade network performance, especially when the network is in a stressed state. Thus, a tradeoff must be made between the amount of data collected and transferred on the one hand, and the speed and accuracy of fault detection and diagnosis on the other. Such a tradeoff can be naturally formulated as a Partially Observable Markov Decision Process (POMDP), whose solution can be used to construct decision rules for both centralized and distributed intelligent agents. Since solving POMDPs exactly for a realistic number of states is computationally prohibitive, we develop a fast reinforcement-learning-based algorithm that learns the decision rule in an approximate network simulator, so that the rule can be deployed quickly to the real network. Simulation results are given for diagnosing a switch fault in an ATM network.
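The monitoring-versus-diagnosis tradeoff can be illustrated with a toy model. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes a hypothetical network of K = 3 components with exactly one fault, noisy probes with assumed alarm rates, and a fixed per-probe cost standing in for monitoring overhead. The belief over the hidden fault is maintained with a standard Bayes filter (the POMDP belief update), and standard tabular Q-learning over a coarsely discretized belief is swapped in for the paper's reinforcement-learning method, whose details are not given here. All names (`update_belief`, `step`, `train`, `evaluate`) and parameter values are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy model (not from the paper): K components, exactly one faulty.
K = 3
P_ALARM_IF_FAULTY = 0.95   # assumed: probing the faulty component raises an alarm
P_ALARM_IF_HEALTHY = 0.05  # assumed: false-alarm rate when probing a healthy one
PROBE_COST = 0.05          # assumed: monitoring overhead charged per probe
MAX_STEPS = 15             # assumed: cap on actions per episode
N_ACTIONS = 2 * K          # 0..K-1 probe component a; K..2K-1 declare a-K faulty


def update_belief(belief, probed, alarm):
    """POMDP belief update: b'(s) is proportional to P(o | s, probe) * b(s)."""
    new = []
    for s, p in enumerate(belief):
        p_alarm = P_ALARM_IF_FAULTY if s == probed else P_ALARM_IF_HEALTHY
        new.append(p * (p_alarm if alarm else 1.0 - p_alarm))
    total = sum(new)
    return [x / total for x in new]


def discretize(belief, bins=10):
    """Round the continuous belief onto a grid so it can index a Q-table."""
    return tuple(round(p * bins) for p in belief)


def step(fault, belief, action, rng):
    """Approximate simulator: returns (new_belief, reward, done)."""
    if action >= K:  # declaring a diagnosis ends the episode
        return belief, (1.0 if action - K == fault else -1.0), True
    p_alarm = P_ALARM_IF_FAULTY if action == fault else P_ALARM_IF_HEALTHY
    alarm = rng.random() < p_alarm
    return update_belief(belief, action, alarm), -PROBE_COST, False


def greedy(q_row):
    return max(range(N_ACTIONS), key=lambda a: q_row[a])


def train(episodes=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning (undiscounted, epsilon-greedy) in the simulator."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * N_ACTIONS)
    for _ in range(episodes):
        fault = rng.randrange(K)
        belief = [1.0 / K] * K  # uniform prior over which component is faulty
        for t in range(MAX_STEPS):
            key = discretize(belief)
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                action = greedy(q[key])
            belief, reward, done = step(fault, belief, action, rng)
            if not done and t == MAX_STEPS - 1:
                reward, done = -1.0, True  # treat a timeout as a failed diagnosis
            target = reward if done else reward + max(q[discretize(belief)])
            q[key][action] += alpha * (target - q[key][action])
            if done:
                break
    return q


def evaluate(q, episodes=2000, seed=1):
    """Fraction of episodes in which the greedy policy names the faulty component."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(episodes):
        fault = rng.randrange(K)
        belief = [1.0 / K] * K
        for _ in range(MAX_STEPS):
            action = greedy(q[discretize(belief)])
            belief, reward, done = step(fault, belief, action, rng)
            if done:
                if reward > 0:
                    correct += 1
                break
    return correct / episodes
```

Usage: `q = train(); print(evaluate(q))` reports the fraction of correctly diagnosed faults; the exact value depends on the seed and the assumed parameters. Rounding the belief to a 0.1 grid is only a crude stand-in for a proper belief-space approximation. Raising `PROBE_COST` should tend to make the learned policy probe less and declare earlier, which is the overhead-versus-accuracy tradeoff in miniature.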
Similar resources
Proactive QoE Provisioning in Heterogeneous Access Networks using Hidden Markov Models and Reinforcement Learning
Quality of Experience (QoE) provisioning in heterogeneous access networks (HANs) can be achieved via handoffs. The current approaches for QoE-aware handoffs either lack the availability of a network path probing method or lack the availability of efficient methods for QoE prediction. Further, the current approaches do not explore the benefits of proactive QoE-aware handoffs such that user’s QoE...
Using Sliding Mode Controller and Eligibility Traces for Controlling the Blood Glucose in Diabetic Patients at the Presence of Fault
Some people suffering from diabetes use insulin injection pumps to control the blood glucose level. Sometimes, a fault may occur in the sensor or actuator of these pumps. The main objective of this paper is to control the blood glucose level at the desired level and to achieve fault-tolerant control of these injection pumps. To this end, the eligibility traces algorithm is combined with the sliding mod...
Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning
An ad hoc wireless network is an autonomous self-organizing system of mobile nodes connected by wireless links, where nodes not in direct range communicate via intermediary nodes. Routing in ad hoc networks is a challenging problem as a result of highly dynamic topology as well as bandwidth and energy constraints. The Swarm Intelligence paradigm has recently been demonstrated as an effective appr...
Autonomic Computer Network Defence Using Risk State and Reinforcement Learning
Computer Network Defence is concerned with the active protection of information technology infrastructure against malicious and accidental incidents. Given the growing complexity of IT systems and the speed at which automated attacks can be launched, implementing timely and efficient network incident mitigating actions, whether proactive or reactive, is a great challenge. We refer to the automa...
Evaluation of PAMS' Adaptive Management Services
Management of large-scale parallel and distributed applications is an extremely complex task due to factors such as centralized management architectures, lack of coordination and compatibility among heterogeneous network management systems, and dynamic characteristics of networks and application bandwidth requirements. The development of an integrated network management framework that is proact...
Publication date: 1999